External comparators (ECs) are a useful emerging research methodology that complements clinical trials by providing real-world (RW) context on comparator treatments, and they are increasingly of interest to regulators and payers in rare diseases and their sub-populations. Indeed, recent publications from FDA and NICE insist on assessing the fitness-for-purpose of RW data, covering both data quality (e.g., completeness, accuracy of key study variables) and relevance (e.g., availability of trial-like outcomes), so that RW evidence studies, including ECs, can generate robust and reliable results.

In light of these requirements, we aimed to understand the suitability of existing haematology-oncology RW data sources (DSs) for comparison with clinical trial data, with the intention of bringing together qualified partners to establish a research collaboration network specialized in this emerging research methodology. Literature reviews, desk research, and recent/ongoing in-house clinical studies were reviewed to identify European DSs involved in RW haematology-oncology data generation. Of 332 European DSs identified, primary market research was conducted with 46 to ascertain fitness-for-purpose, including patient counts, variable availability, data quality, and operational aspects of data access.

As detailed in Table 1 below, among the 46 sources, approximately 9,560 patients per year were recorded across 6 haematology-oncology malignancies - DLBCL, FL, CLL, MM, MCL, and MZL - spanning 11 European countries.

We found that data on patient and disease characteristics, labs, clinical outcomes, and treatment sequence were generally readily available (collected in approximately 100%, 87%, 91%, and 91% of sources, respectively). Data quality and completeness were reasonable overall, but mixed for response measures and prognostic factors, particularly at later lines of therapy. On the other hand, genetic markers, QoL measures, and HCRU measures were often not readily available (collected in only approximately 46%, 20%, and 11% of sources, respectively).

Regarding operational aspects, hospitals were more willing than registries and claims databases to share patient-level data with external researchers. Most DSs reported that data collection could be conducted via electronic extraction (e.g., electronic medical records, 65%) or a mix of electronic and manual methods (e.g., case report forms, 48%), which would enable the enhancement of structured EMR data with unstructured data (e.g., data captured in clinical notes or pathology reports).

Though data availability across the assessed European haematology-oncology DSs is sufficient to support ECs, artificial intelligence was identified as a useful tool to help enhance RW data readily captured within EMRs - such as genetic markers, response measures and prognostic factors - and improve the comparability of clinical trial and RW patients. Specifically, emerging machine learning (ML) methods can improve propensity score matching, increase statistical power, and reduce bias in treatment effect estimates, or can offer alternatives to propensity score matching, such as G-computation and double debiased ML. Moreover, natural language processing (NLP) techniques can be used to extract unstructured data from clinical notes or pathology reports, instead of relying on more burdensome manual chart review methods.
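To make the idea of G-computation concrete, the following is a minimal sketch and not the method used in this work. It assumes a single binary prognostic factor as the only confounder, so the outcome model reduces to the observed response rate in each stratum. The record keys (`treated`, `high_risk`, `response`), the function names, and all patient counts are hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def naive_risk_difference(records):
    """Unadjusted difference in response rate: trial arm minus external arm."""
    def rate(arm):
        group = [r["response"] for r in records if r["treated"] == arm]
        return sum(group) / len(group)
    return rate(1) - rate(0)


def g_computation_risk_difference(records):
    """Risk difference standardised over the pooled population.

    Each record is a dict with hypothetical keys:
      'treated'   : 1 for trial patients, 0 for external-comparator patients
      'high_risk' : binary prognostic factor (the only confounder assumed here)
      'response'  : 1 if the patient responded, else 0
    """
    # Step 1: outcome model. With one binary confounder, the saturated model
    # is the observed response rate in each (treated, high_risk) cell.
    totals = defaultdict(int)
    responders = defaultdict(int)
    for r in records:
        cell = (r["treated"], r["high_risk"])
        totals[cell] += 1
        responders[cell] += r["response"]

    def predicted_response(treated, high_risk):
        cell = (treated, high_risk)
        if totals[cell] == 0:
            raise ValueError(f"no patients in cell {cell}; positivity is violated")
        return responders[cell] / totals[cell]

    # Step 2: predict every patient's outcome under both treatments,
    # then average the predictions over the pooled population.
    n = len(records)
    risk_if_treated = sum(predicted_response(1, r["high_risk"]) for r in records) / n
    risk_if_control = sum(predicted_response(0, r["high_risk"]) for r in records) / n
    return risk_if_treated - risk_if_control


def make_cell(treated, high_risk, n_patients, n_responders):
    """Build invented patient records for one (treated, high_risk) cell."""
    return [
        {"treated": treated, "high_risk": high_risk,
         "response": 1 if i < n_responders else 0}
        for i in range(n_patients)
    ]


# Invented toy data: the trial arm is mostly low-risk, the external arm mostly high-risk.
toy = (
    make_cell(1, 0, 20, 16) + make_cell(1, 1, 5, 2)
    + make_cell(0, 0, 5, 3) + make_cell(0, 1, 20, 4)
)

print(f"naive:    {naive_risk_difference(toy):.2f}")
print(f"adjusted: {g_computation_risk_difference(toy):.2f}")
```

In this invented data set the naive difference is 0.44, while the standardised difference is 0.20, because the trial arm contains far fewer high-risk patients than the external arm. A real EC analysis would typically replace the stratum means with a regression or ML outcome model over many covariates and would add uncertainty estimates (e.g., by bootstrapping).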

Emerging technologies show significant promise for improving the data quality and operational efficiency of future haematology-oncology EC studies. However, techniques such as NLP and ML may have limitations when applied to the small sample sizes typical of rare diseases. Especially for regulatory and payer audiences, RW evidence generated with these techniques will require data validation and compliance with information governance models to ensure fidelity and broad applicability.

No relevant conflicts of interest to declare.

